Machine learning of the Bayesian belief network as a tool for evaluating the process frequency on social network data

Abstract

The paper addresses the problem of estimating the frequency of processes modeled as stochastic processes that consist of a series of sequential episodes, where the class of distributions of the time interval between episodes is known. In the previously proposed approach, the input data included the length of the interval between the last episode and the end of the study period, which could lead to inaccurate results. This interval differs in nature from the intervals between successive episodes, so its representation and processing require approaches that take this difference into account. The accuracy of the process frequency estimates was improved by developing a new model based on a Bayesian belief network that includes nodes corresponding to the intervals between the last episodes of the process and to the minimum and maximum intervals between episodes, and by correctly accounting for the interval between the last episode and the end of the study period at the model training stage. The authors propose a Bayesian belief network that includes a random element characterizing the interval between the last episode of the process within the study period and the end of that period; data on this interval may be available at the training stage. The Bayesian belief network was modeled in R with the bnlearn package. A new approach to estimating process frequency is proposed, based on a Bayesian belief network built by machine learning methods. It increases the accuracy of the results by correctly handling the interval between the last episode and the end of the study period through a special scheme in the learned Bayesian belief network that includes a “hypothetical” episode after the end of the study period.
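The interval quantities named above can be illustrated with a minimal Python sketch. It is not the authors' R/bnlearn code: the function name `interval_features`, the dictionary keys, and the use of days as the unit are assumptions made for illustration. In particular, treating the "hypothetical" episode as the first episode observed after the study period is only one possible reading of the scheme.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def interval_features(episodes, period_end, next_episode=None):
    """Derive interval features (in days) from one process's episode times.

    episodes     -- datetimes of episodes inside the study period
    period_end   -- datetime marking the end of the study period
    next_episode -- optional datetime of the first episode after the period
                    (assumed here to play the role of the "hypothetical" episode)
    """
    times = sorted(episodes)
    if len(times) < 2:
        raise ValueError("at least two episodes are needed to form an interval")

    # Intervals between successive episodes inside the study period.
    gaps = [
        (later - earlier).total_seconds() / SECONDS_PER_DAY
        for earlier, later in zip(times, times[1:])
    ]

    features = {
        "last_gap": gaps[-1],   # interval between the two last episodes
        "min_gap": min(gaps),   # minimum interval between episodes
        "max_gap": max(gaps),   # maximum interval between episodes
        # Interval between the last episode and the end of the study period.
        # It is censored: the next episode has not happened yet at period_end.
        "tail_gap": (period_end - times[-1]).total_seconds() / SECONDS_PER_DAY,
    }

    # Available only at the training stage, when the follow-up episode is known.
    if next_episode is not None:
        features["hypothetical_gap"] = (
            (next_episode - times[-1]).total_seconds() / SECONDS_PER_DAY
        )
    return features
```

In this sketch `tail_gap` is kept apart from the ordinary gaps because it is bounded by the end of observation rather than by a real episode, which is the distinction the abstract draws.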
To test the proposed approach, data were collected on 5608 Instagram users: the times of their posts during 2020 and the time of their first post in 2021. The model was trained on 70 % of the sample, and the remaining 30 % was used to compare the posting frequency values predicted by the model with the known values. The results can be used in fields of science where a process frequency has to be estimated under an information deficit, when the whole process can be observed only for a limited time. Obtaining such estimates is often an important task in medicine, epidemiology, sociology, and other fields. The approach shows good results when the theoretical model is compared with the results of learning from the social network data, which makes it possible to automate the estimation of process frequency.
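The 70/30 evaluation scheme can be sketched in Python as follows. The abstract does not say how users were assigned to the two parts or which error measure was used, so the seeded random shuffle, the helper names `split_sample` and `mean_absolute_error`, and the choice of mean absolute error are all assumptions made for illustration.

```python
import random


def split_sample(user_ids, train_share=0.7, seed=0):
    """Randomly split user identifiers into training and test parts."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    cut = round(len(ids) * train_share)
    return ids[:cut], ids[cut:]


def mean_absolute_error(predicted, observed):
    """Average absolute gap between predicted and known frequency values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# With the sample size reported in the abstract (5608 users):
train_ids, test_ids = split_sample(range(5608))
print(len(train_ids), len(test_ids))
```

Under these assumptions the split yields 3926 training users and 1682 test users. Frequencies predicted for the test users would then be passed to `mean_absolute_error` together with their known values.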
